1 Algorithmic selection of the best method for compressing map data strings
E. L. Amidon, G. S. Akin 

December 1971 Communications of the ACM, Volume 14 issue 12 

Full text available: pdf(445.93 KB) Additional Information: full citation, abstract, references, citings

The best of a dozen different methods for compressing map data is illustrated. The choices 
are generated by encoding data strings (sequences of like codes) by three methods and in
four directions. Relationships are developed between compression alternatives to avoid 
comparing all of them. The technique has been used to compress data from forest resource 
maps, but is widely applicable to map and photographic data reduction. 

Keywords: data compression, data reduction, information retrieval, input/output, map 
storage, run coding 
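
As a rough illustration of the run coding named in the keywords above, the sketch below collapses strings of like codes into (code, run length) pairs and compares two scan directions of a small grid. The grid, the function names, and the choice of only two directions are hypothetical; the paper's three encoding methods and four directions are not reproduced here.

```python
from itertools import groupby

def run_encode(codes):
    """Collapse each run of identical codes into a (code, run_length) pair."""
    return [(code, len(list(group))) for code, group in groupby(codes)]

def scan_rows(grid):
    """Flatten the grid row by row."""
    return [cell for row in grid for cell in row]

def scan_columns(grid):
    """Flatten the grid column by column."""
    return [cell for column in zip(*grid) for cell in column]

def best_scan(grid):
    """Return (direction_name, runs) for the scan direction giving the fewest runs."""
    candidates = {
        "rows": run_encode(scan_rows(grid)),
        "columns": run_encode(scan_columns(grid)),
    }
    name = min(candidates, key=lambda key: len(candidates[key]))
    return name, candidates[name]

# Hypothetical 3x3 map: one column of code "A", two columns of code "B".
grid = [
    ["A", "B", "B"],
    ["A", "B", "B"],
    ["A", "B", "B"],
]
direction, runs = best_scan(grid)
```

On this grid the column scan yields far fewer runs than the row scan, which is the kind of difference that makes the choice of direction matter.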


2 Performance comparison of property map and bitmap indexing
Ashima Gupta, Karen C. Davis, Jennifer Grommon-Litton 

November 2002 Proceedings of the 5th ACM international workshop on Data 
Warehousing and OLAP 

Full text available: pdf(250.60 KB) Additional Information: full citation, abstract, references, index terms

A data warehouse is a collection of data from different sources that supports analytical 
querying. A Bitmap Index (BI) allows fast access to individual attribute values that are 
needed to answer a query by representing the values of an attribute for all tuples 
separately, as bit strings. A Property Map (PMap) is a multidimensional indexing technique 
that pre-computes attribute expressions, called properties, for each tuple and stores the 
results as bit strings [DD97, LD02]. This paper compares t ...


Keywords: bitmap index, data warehouse, performance study 
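
The bitmap index idea in the abstract above can be sketched in a few lines: one bit string per distinct attribute value, with bit i set when tuple i carries that value. This is a minimal sketch, not the implementation evaluated in the paper; the column data and function names are hypothetical, and Python integers stand in for bit strings.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when tuple i has that value."""
    index = defaultdict(int)
    for position, value in enumerate(values):
        index[value] |= 1 << position
    return dict(index)

def matching_rows(bitmap):
    """List the tuple positions whose bits are set in the bitmap."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows

# Hypothetical two-attribute table with four tuples.
region = ["east", "west", "east", "north"]
product = ["pen", "pen", "ink", "pen"]

region_index = build_bitmap_index(region)
product_index = build_bitmap_index(product)

# A conjunctive query becomes a bitwise AND, a disjunctive one a bitwise OR.
east_and_pen = matching_rows(region_index["east"] & product_index["pen"])
east_or_west = matching_rows(region_index["east"] | region_index["west"])
```

Answering a query this way touches only the bit strings for the values it names, which is the fast access the abstract refers to.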


3 Fast String Kernels using Inexact Matching for Protein Sequences
Christina Leslie, Rui Kuang 

December 2004 The Journal of Machine Learning Research, Volume 5 
Full text available: pdf(347.79 KB) Additional Information: full citation, abstract

We describe several families of k-mer based string kernels related to the recently presented
mismatch kernel and designed for use with support vector machines (SVMs) for
classification of protein sequence data. These new kernels (restricted gappy kernels,
substitution kernels, and wildcard kernels) are based on feature spaces indexed by
k-length subsequences ("k-mers") from the string alphabet Σ. However, for all kernels we
define here, the kernel value K(x ... 

4 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 

5 Integrating symbolic images into a multimedia database system using classification
and abstraction approaches 

Aya Soffer, Hanan Samet 

December 1998 The VLDB Journal: The International Journal on Very Large Data Bases,
Volume 7 Issue 4

Full text available: pdf(227.30 KB) Additional Information: full citation, abstract, index terms

Symbolic images are composed of a finite set of symbols that have a semantic meaning. 
Examples of symbolic images include maps (where the semantic meaning of the symbols is 
given in the legend), engineering drawings, and floor plans. Two approaches for supporting 
queries on symbolic-image databases that are based on image content are studied. The 
classification approach preprocesses all symbolic images and attaches a semantic 
classification and an associated certainty factor to each object that ... 

Keywords: Image indexing, Multimedia databases, Query optimization, Retrieval by 
content, Spatial databases, Symbolic-image databases 


6 Comparison of minisatellites
Severine Berard, Eric Rivals 

April 2002 Proceedings of the sixth annual international conference on Computational 
biology 

Full text available: pdf(2.48 MB) Additional Information: full citation, abstract, references, index terms

In the class of repeated sequences that occur in DNA, minisatellites have been found 
polymorphic and became useful tools in genetic mapping and forensic studies. They consist 
of a heterogeneous tandem array of a short repeat unit. The slightly different units along 
the array are called variants. Minisatellites evolve mainly through tandem duplications and 
tandem deletions of variants. Jeffreys et al. devised a method to obtain the sequence of 
variants along the array in a digital code, and calle ... 

Keywords: alignment, bioinformatics, dynamic programming, evolution, minisatellite, 
overlap graphs, sequence comparison, tandem repeats 


7 A comparison of Chinese document indexing strategies and retrieval models 
Robert W. P. Luk, K. L. Kwok 

September 2002 ACM Transactions on Asian Language Information Processing (TALIP),
Volume 1 Issue 3

Full text available: pdf(419.42 KB) Additional Information: full citation, abstract, references, index terms

With the advent of the Internet and intranets, substantial interest is being shown in Asian 
language information retrieval; especially in Chinese, which is a good example of an Asian 
ideographic language (other examples include Japanese and Korean). Since, in this type of 
language, spaces do not delimit words, an important issue is which index terms should be 
extracted from documents. This issue also has wider implications for indexing other 
languages such as agglutinating languages (e.g., Finni ... 

Keywords: Chinese information retrieval, comparison, indexing strategies 


8 The FINITE STRING Newsletter: Abstracts of current literature
Computational Linguistics Staff 

January 1987 Computational Linguistics, Volume 13 issue 1-2 

Full text available: pdf(6.15 MB), Publisher Site Additional Information: full citation


9 String storage and searching for data base applications: Implementation on the INDY
backend kernel 

George P. Copeland 

August 1978 Proceedings of the fourth workshop on Computer architecture for
non-numeric processing

Full text available: pdf(854.23 KB) Additional Information: full citation, abstract, references, citings, index terms

User and hardware cost trends dictate that data base systems should provide more 
complete functionality, simplicity of use, and reliability by increasing the amount of 
hardware present in the system. These goals are accomplished with a simple hardware 
arrangement within a one-dimensional cellular storage system called INDY. The INDY 
backend kernel is intended as a powerful tool for implementing all data models. The INDY 
cellular storage array is intended to provide functionality that is dif ... 

10 String storage and searching for data base applications: implementation on the INDY
backend kernel 

George P. Copeland 

August 1978, Volume 10, 13, 7 Issue 1, 2, 2

Full text available: pdf(986.51 KB) Additional Information: full citation, abstract, references

User and hardware cost trends dictate that data base systems should provide more 
complete functionality, simplicity of use, and reliability by increasing the amount of 
hardware present in the system. These goals are accomplished with a simple hardware 
arrangement within a one-dimensional cellular storage system called INDY. The INDY 
backend kernel is intended as a powerful tool for implementing all data models. The INDY 
cellular storage array is intended to provide functionality that is difficul ... 

11 A guided tour to approximate string matching
Gonzalo Navarro 

March 2001 ACM Computing Surveys (CSUR), Volume 33 Issue 1

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We survey the current techniques to cope with the problem of string matching that allows 
errors. This is becoming a more and more relevant issue for many fast growing areas such 
as information retrieval and computational biology. We focus on online searching and
mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its 
history and current developments, and the central ideas of the algorithms and their 
complexities. We present a number of experiments to ... 

Keywords: Levenshtein distance, edit distance, online string matching, text searching 
allowing errors 
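
For readers unfamiliar with the edit distance mentioned in the abstract above, here is a minimal sketch of the classic dynamic-programming computation, followed by the well-known variant that searches a text online for occurrences of a pattern with at most k errors. The function names and sample strings are hypothetical, and the faster algorithms the survey covers are not attempted here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,         # delete char_a
                               current[j - 1] + 1,      # insert char_b
                               previous[j - 1] + cost)) # match or substitute
        previous = current
    return previous[-1]

def approximate_match_ends(pattern, text, k):
    """Return text positions where some substring ending there is within k edits of pattern."""
    m = len(pattern)
    # column[i] = fewest edits between pattern[:i] and any substring ending at the current position.
    column = list(range(m + 1))
    ends = []
    for position, text_char in enumerate(text):
        diagonal = column[0]  # column[0] stays 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            above = column[i]
            cost = 0 if pattern[i - 1] == text_char else 1
            column[i] = min(above + 1, column[i - 1] + 1, diagonal + cost)
            diagonal = above
        if column[m] <= k:
            ends.append(position)
    return ends

distance = edit_distance("kitten", "sitting")
exact_ends = approximate_match_ends("abc", "xxabcxx", 0)
```

The search variant differs from the plain distance only in keeping the first cell at zero, so a match can begin at any text position; it reads the text one character at a time, which is what makes it an online method.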


12 Jump map-based interactive texture synthesis 
Steve Zelinka, Michael Garland 

October 2004 ACM Transactions on Graphics (TOG), Volume 23 issue 4 

Full text available: pdf(529.89 KB) Additional Information: full citation, abstract, references, index terms

We present techniques for accelerated texture synthesis from example images. The key 
idea of our approach is to divide the task into two phases: analysis, and synthesis. During 
the analysis phase, which is performed once per sample texture, we generate a jump
map. Using the jump map, the synthesis phase is capable of synthesizing texture
similar to the analyzed example at interactive rates. We describe two such synthesis phase 
algorithms: one for creating images, and one for di ... 

Keywords: Interactive texture synthesis, jump maps, texturing surfaces 

13 Identifying the semantic and textual differences between two versions of a program 
Susan Horwitz 

June 1990 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1990 conference
on Programming language design and implementation, Volume 25 Issue 6

Full text available: pdf(1.27 MB) Additional Information: full citation, abstract, references, citings, index terms

Text-based file comparators (e.g., the Unix utility diff) are very general tools that can be
applied to arbitrary files. However, using such tools to compare programs can be 
unsatisfactory because their only notion of change is based on program text rather than 
program behavior. This paper describes a technique for comparing two versions of a 
program, determining which program components r ... 

14 Programming by Refinement, as Exemplified by the SETL Representation
Sublanguage 

Robert K. Dewar, Arthur and Ssu-Cheng Liu and Jacob T. Schwartz and Edmond Schonberg 
January 1979 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 1 Issue 1

Full text available: pdf(1.49 MB) Additional Information: full citation, abstract, references, citings, index terms

"Pure" SETL is a language of very high level allowing algorithms to be programmed rapidly 
and succinctly. SETL's representation sublanguage adds a system of declarations which allow
the user of the language to control the data structures that will be used to implement an 
algorithm which has already been written in pure SETL, so as to improve its efficiency. 
Ideally no rewriting of the algorithm should be necessary. The facilities provided by the 
representation sublanguage and the ... 

15 Comparison of the Programming Languages C and Pascal 
Alan R. Feuer, Narain H. Gehani 

January 1982 ACM Computing Surveys (CSUR), Volume 14 Issue 1

Full text available: pdf(1.75 MB) Additional Information: full citation, references, citings, index terms

16 The string B-tree: a new data structure for string search in external memory and its
applications 

Paolo Ferragina, Roberto Grossi 

March 1999 Journal of the ACM (JACM), volume 46 issue 2 

Full text available: pdf(363.37 KB) Additional Information: full citation, abstract, references, citings, index terms

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a 
link between some traditional external-memory and string-matching data structures. In a 
short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is 
made more effective by adding extra pointers to speed up search and update operations. 
Consequently, the String B-Tree overcomes the theoretical limitations of inverted files,
B-trees, prefix B-trees, s ...

Keywords: B-tree, Patricia trie, external-memory data structure, prefix and range search, 
string searching and sorting, suffix array, suffix tree, text index 


17 A mathematical approach to nondeterminism in data types
Wim H. Hesselink 

January 1988 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 10 Issue 1

Full text available: pdf(2.23 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The theory of abstract data types is generalized to the case of nondeterministic operations 
(set-valued functions). Since the nondeterminism of operations may be coupled, signatures
are extended so that operations can have results in Cartesian products. Input/output 
behavior is used to characterize implementation of one model by another. It is described by 
means of accumulated arrows, which form a generalization of the term algebra. Morphisms 
of nondeterministic models are introduced. Both i ... 

18 Status report of the graphic standards planning committee of ACM/SIGGRAPH:
State-of-the-art of graphic software packages

Computer Graphics staff

September 1977 ACM SIGGRAPH Computer Graphics, Volume 11 Issue 3 
Full text available: pdf(9.03 MB) Additional Information: full citation, references


19 A general-purpose compression scheme for large collections

July 2002 ACM Transactions on Information Systems (TOIS), Volume 20 issue 3 

Full text available: pdf(260.29 KB) Additional Information: full citation, abstract, references, index terms, review

Compression of large collections can lead to improvements in retrieval times by offsetting 
the CPU decompression costs with the cost of seeking and retrieving data from disk. We 
propose a semistatic phrase-based approach called xray that builds a model offline using 
sample training data extracted from a collection, and then compresses the entire collection 
online in a single pass. The particular benefits of xray are that it can be used in applications 
where individual records or documents must b ... 

Keywords: phrase-based compression, random access, sampling 

20 Text classification using string kernels

Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, Chris Watkins 
March 2002 The Journal of Machine Learning Research, Volume 2 

Full text available: pdf(216.07 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a novel approach for categorizing text documents based on the use of a special 
kernel. The kernel is an inner product in the feature space generated by all subsequences of 
length k. A subsequence is any ordered sequence of k characters
occurring in the text though not necessarily contiguously. The subsequences are weighted 
by an exponentially decaying factor of their full length in the text, hence emphasising those 
occurrences that are close t ... 

Keywords: approximating kernels, kernels and support vector machines, string 
subsequence kernel, text classification 